Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features
Abstract
Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructed an idiom corpus for Japanese. This paper reports on the corpus and the results of an idiom identification experiment using the corpus. The corpus targets 146 ambiguous idioms and consists of 102,846 sentences, each of which is annotated with a literal/idiom label. For idiom identification, we targeted 90 of the 146 idioms and adopted a word sense disambiguation (WSD) method using both common WSD features and idiom-specific features. The corpus and the experiment are the largest of their kind, as far as we know. We found that a standard supervised WSD method works well for idiom identification, achieving an accuracy of 89.25% with idiom-specific features and 88.86% without them, and that the most effective idiom-specific feature is the one involving the adjacency of idiom constituents.
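To make the idea of an adjacency feature concrete, here is a minimal sketch of how such a feature might be computed alongside ordinary context-word features. It is an illustration only, not the feature definition used in the paper: the function names `constituents_adjacent` and `extract_features`, the bag-of-words context features, and the English example are all assumptions introduced here (the corpus itself is Japanese).

```python
from typing import Dict, Sequence


def constituents_adjacent(tokens: Sequence[str], constituents: Sequence[str]) -> bool:
    """Return True if the idiom constituents occur as one contiguous, in-order run.

    Hypothetical stand-in for an "adjacency of idiom constituents" feature.
    """
    n = len(constituents)
    if n == 0:
        return False
    target = list(constituents)
    # Slide a window of length n over the sentence and compare it to the idiom.
    return any(list(tokens[i:i + n]) == target
               for i in range(len(tokens) - n + 1))


def extract_features(tokens: Sequence[str], constituents: Sequence[str]) -> Dict[str, bool]:
    """Combine simple context-word features with the adjacency feature.

    The context features are a plain bag of words over the non-idiom tokens,
    an assumed simplification of "common WSD features".
    """
    feats = {f"ctx={tok}": True for tok in tokens if tok not in constituents}
    feats["idiom_adjacent"] = constituents_adjacent(tokens, constituents)
    return feats


if __name__ == "__main__":
    idiom = ["kicked", "the", "bucket"]
    print(extract_features(["he", "kicked", "the", "bucket"], idiom))
    print(extract_features(["he", "kicked", "the", "old", "bucket"], idiom))
```

A feature dictionary of this shape could then be passed to any standard supervised classifier; which classifier and which exact features the authors used is not stated in the abstract.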
Similar resources
Drawing a Line between Literal and Idiomatic Meanings Based on Supervised WSD
Hashimoto, Chikara and Kawahara, Daisuke. 2008. Drawing a Line between Literal and Idiomatic Meanings Based on Supervised WSD. Linguistic Research 25(2), 105-123. Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructe...
The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning
This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were sorted out of a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...
The Performance of Iranian EFL Learners in Producing and Recognizing Idiom-Containing Sentences
This study aimed to investigate how Iranian EFL learners performed in producing sentences containing idioms and whether they had any problems in producing such sentences. This query, subsequently, raised the question of whether idioms influenced the participants’ grammaticality judgment on idiom-containing sentences. For this purpose, firstly, the writings of 24 learners were investigated for a...
Automatic Idiom Identification in Wiktionary
Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed ...
The impact of Persian transfer on Kurd learners’ idiom comprehension: parts of body in focus
The present study aims to consider the linguistic influence of official standard Persian language on idiom comprehension focusing on parts of body comprehension of Kurdish EFL learners. Most of the Kurds have studied in schools in which the language spoken or written has been different from their mother tongue. The present study is based on data from 92 EFL learners whom Kurdish is their first ...